Feature Reduction for High-Precision Text Classification

Authors

  • Yi-Xian Lin
  • Been-Chian Chien
Abstract

Processing high-dimensional features is central to document analysis and text classification. Traditional techniques for feature selection or extraction rely heavily on the distribution of term features across the document set, and finding the significant features generally incurs a high computational cost. In this paper, we propose a new feature reduction method for text classification based on the analysis of discriminant coefficients. The main term features, called log-scaled discriminant coefficients, are designed to be generated efficiently. The original term-frequency feature space is then converted into a small feature set. The reduced document features are used to train classifiers with a support vector machine. The experimental results show that the proposed feature reduction approach is effective in comparison with other methods; in particular, it yields a marked improvement in the precision of text classification.
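The abstract does not define the log-scaled discriminant coefficient, so the following is only a minimal sketch of the general idea under an assumed definition: each term gets, per class, the log of its smoothed in-class to out-of-class frequency ratio, and each document is reduced to one feature per class. The function names `fit_coefficients` and `reduce_document`, the `smoothing` parameter, and the formula itself are hypothetical illustrations, not the authors' method.

```python
import math
from collections import Counter, defaultdict


def fit_coefficients(docs, labels, smoothing=1.0):
    """Return {class: {term: coefficient}} under an ASSUMED definition:
    log of the smoothed in-class / out-of-class term frequency ratio."""
    class_counts = defaultdict(Counter)  # term counts per class
    for tokens, label in zip(docs, labels):
        class_counts[label].update(tokens)

    total = Counter()  # term counts over all classes
    for counts in class_counts.values():
        total.update(counts)
    vocab = set(total)
    grand_total = sum(total.values())

    coeffs = {}
    for label, counts in class_counts.items():
        in_total = sum(counts.values())
        out_total = grand_total - in_total
        denom_in = in_total + smoothing * len(vocab)
        denom_out = out_total + smoothing * len(vocab)
        coeffs[label] = {
            term: math.log(
                ((counts[term] + smoothing) / denom_in)
                / ((total[term] - counts[term] + smoothing) / denom_out)
            )
            for term in vocab
        }
    return coeffs


def reduce_document(tokens, coeffs):
    """Map a token list to one feature per class (classes in sorted order):
    the term-frequency-weighted sum of that class's coefficients."""
    tf = Counter(tokens)
    return [
        sum(n * coeffs[label].get(term, 0.0) for term, n in tf.items())
        for label in sorted(coeffs)
    ]


# Toy data (hypothetical): two classes, four tokenized documents.
docs = [
    ["goal", "match", "team"],
    ["team", "score", "goal"],
    ["stock", "market", "price"],
    ["market", "trade", "stock"],
]
labels = ["sport", "sport", "finance", "finance"]

coeffs = fit_coefficients(docs, labels)
features = [reduce_document(d, coeffs) for d in docs]
```

Here each document shrinks from a vocabulary-sized term-frequency vector to a vector with one entry per class. Vectors of this kind could then be passed to a support vector machine implementation, for example scikit-learn's `sklearn.svm.LinearSVC`, to train the classifier.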


Similar resources

Sentiment Quantification

These articles present unique innovative research, computational methods, and selected results and examples. In “Sentiment Quantification,” Andrea Esuli and Fabrizio Sebastiani argue that the opinion-mining community has traditionally neglected whether the analysis of large quantities of text should be carried out at the individual or aggregate level. They review several sentiment quantificatio...


Multivariate Classification for Qualitative Analysis

Contents: Introduction; Principles of classification; The classes; Main categories of classification methods ...


On Multivariate Methods in Robust Econometrics

This work studies implicitly weighted robust statistical methods suitable for econometric problems. We study robust estimation mainly in the context of heteroscedasticity or high dimension, which are topical issues in current econometrics. We describe a modification of linear regression resistant to heteroscedasticity and study its computational aspects. For a robust version of the instrum...


Classification of News Stories Using Support Vector Machines

Given a data set and a data mining task such as classification, there are two main reasons for performing feature space reduction. The first is to improve the accuracy of the algorithm. In a domain such as text mining, the common technique of parameterizing each document as a vector of words results in thousands of dimensions. The performance of many learning algorithms decreases as the dimensiona...


Proposing the novelty classifier for face recognition

Introduction: Face recognition, one of the most explored themes in biometry, is used in a wide range of applications: access control, forensic detection, surveillance and monitoring systems, and robotic and human-machine interactions. In this paper, a new classifier is proposed for face recognition: the novelty classifier. Methods: The performance of a novelty classifier is compared with the...




Publication date: 2011